
BUG: Index with duplicate labels raises ValueError in Dataframe.query #52224

Closed
wants to merge 11 commits into from

steliospetrakis02
Contributor

@steliospetrakis02 steliospetrakis02 changed the title BUG: Index with duplicate labels raises ValueError in Dataframe.query #51815 BUG: Index with duplicate labels raises ValueError in Dataframe.query #52224 Mar 26, 2023
@steliospetrakis02 steliospetrakis02 changed the title BUG: Index with duplicate labels raises ValueError in Dataframe.query #52224 BUG: Index with duplicate labels raises ValueError in Dataframe.query Mar 26, 2023
Member

@mroeschke mroeschke left a comment

The fix should not require a new method on the DataFrame.

@steliospetrakis02
Contributor Author

Hey @mroeschke, just wanted to give you a quick update on my progress. I've integrated the latest version of my code into the query() function, which should help streamline the overall process. Let me know if you have any questions or concerns.

@steliospetrakis02
Contributor Author

Hello @mroeschke, I have made modifications to the query() function to check for duplicate index labels, but some tests have failed. As a new contributor to this open-source project, I would appreciate your expert opinion on whether I should make further changes to the code or focus on fixing the failing tests.

@mroeschke
Member

As a new contributor to this open-source project

Honestly, I would suggest tackling an issue labeled "good first issue" first rather than this bug. A fix should ideally not fail existing tests, and the solution should not involve falling back to a try/except.

@steliospetrakis02
Contributor Author

Hello @mroeschke, I'm not sure if you saw, but I have made changes to the implementation. It no longer falls back to a try/except. Instead, I added an if statement that checks for duplicate index labels. It covers all the constraints you mentioned, and it is currently working.

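    if self.index.duplicated().any():
        # Create a copy of the DataFrame with a unique positional index (0..n-1)
        # to avoid reindexing errors caused by the duplicate labels
        unique_index = RangeIndex(len(self.index))
        df_copy = self.copy()
        df_copy.index = unique_index

        # Filter the copy. Its index is unique, so this nested call does not
        # re-enter this branch. Forward the caller's **kwargs instead of
        # hard-coding engine='numexpr', which ignored the requested engine and
        # raises ImportError when numexpr is not installed
        filtered_df = df_copy.query(expr, **kwargs)

        # Map the surviving positions back to the original index labels
        index_mapping = dict(zip(unique_index, self.index))
        filtered_df.index = filtered_df.index.map(index_mapping)

        return filtered_df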

I apologize for the frequent comments, I don't mean to be a bother.
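For anyone who wants to try the idea outside of pandas internals, the following is a minimal standalone sketch of the same approach: evaluate the expression on a copy whose index is replaced by unique positions, then restore the original (possibly duplicated) labels on the result. The helper name `query_with_positional_index` is hypothetical and not part of pandas; it swaps the snippet's manual `RangeIndex` plus dict mapping for `reset_index(drop=True)` and `Index.take`, and it is not the change proposed in this PR.

```python
import pandas as pd


def query_with_positional_index(df: pd.DataFrame, expr: str, **kwargs) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): run DataFrame.query on a copy
    with a unique positional index, then restore the original index labels."""
    if df.index.is_unique:
        # No duplicate labels, so defer to the regular DataFrame.query
        return df.query(expr, **kwargs)

    # reset_index(drop=True) yields a RangeIndex 0..n-1, which is always unique
    positional = df.reset_index(drop=True)
    filtered = positional.query(expr, **kwargs)

    # The filtered index now holds integer positions into the original frame;
    # Index.take maps those positions back to the original labels
    filtered.index = df.index.take(filtered.index.to_numpy())
    return filtered


df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=["x", "x", "y", "y"])
print(query_with_positional_index(df, "a > 1"))
```

Here the rows with `a` equal to 2, 3 and 4 survive, and they keep their original labels `x`, `y`, `y`. One caveat of evaluating against a positional copy, which also applies to the snippet above: any reference to the index inside `expr` (for example `index == 'x'`) sees positions rather than the original labels.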

@github-actions
Contributor

github-actions bot commented May 3, 2023

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 3, 2023
@mroeschke
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this May 3, 2023
Successfully merging this pull request may close these issues.

BUG: Index with duplicate labels raises ValueError in Dataframe.query
2 participants